1 Improved search strategy for large vocabulary continuous Mandarin speech recognition 

Tai-Hsuan Ho; Kae-Cherng Yang; Kuo-Hsun Huang; Lin-Shan Lee; 

Acoustics, Speech, and Signal Processing, 1998. ICASSP '98. Proceedings of the 1998 IEEE International 
Conference on , Volume: 2 , 12-15 May 1998 
Page(s): 825-828 vol.2 
2 A novel statistical language modelling method for continuous Chinese speech recognition 

Tian Bin; Yan Hongxin; Fu Qiang; Yi Kechu; 

Signal Processing Proceedings, 1998. ICSP '98. 1998 Fourth International Conference on , 1998 
Page(s): 734-737 vol.1 
Techniques for automatically correcting words in text 
Karen Kukich 

ACM Computing Surveys (CSUR) December 1992 
Volume 24 Issue 4 

Research aimed at correcting words in text has focused on three progressively more difficult problems: (1) nonword 
error detection; (2) isolated-word error correction; and (3) context-dependent word correction. In response to the 
first problem, efficient pattern-matching and n-gram analysis techniques have been developed for detecting strings 
that do not appear in a given word list. In response to the second problem, a variety of general and 
application-specific spelling cor ... 
2 Queries: Web question answering: is more always better? 
Susan Dumais , Michele Banko , Eric Brill , Jimmy Lin , Andrew Ng 

Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval 
August 2002 

This paper describes a question answering system that is designed to capitalize on the tremendous amount of data 
that is now available online. Most question answering systems use a wide variety of linguistic resources. We focus 
instead on the redundancy available in large corpora as an important resource. We use this redundancy to simplify 
the query rewrites that we need to use, and to support answer mining from returned snippets. Our system performs 
quite well given the simplicity of the techni ... 
3 Evaluation of a simple and effective music information retrieval method 
Stephen Downie , Michael Nelson 

Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval 

July 2000 

We developed, and then evaluated, a music information retrieval (MIR) system based 
upon the intervals found within the melodies of a collection of 9354 folksongs. The songs 
were converted to an interval-only representation of monophonic melodies and then 
fragmented into length-n subsections called n-grams. The length of these n-grams and 
the degree to which we precisely represent the intervals are variables analyzed in this 
paper. We constructed a collection of “musical word” da ... 
4 Using n-grams for Korean text retrieval 
Joo Ho Lee , Jeong Soo Ahn 

Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval 

August 1996 
5 Comparison of word-based and syllable-based retrieval for Tibetan (poster session) 91% 
Paul G. Hackett , Douglas W. Oard 

Proceedings of the fifth international workshop on Information retrieval with Asian languages November 2000 

Tibetan retrieval based on automatically segmented words is compared with the use of 
overlapping syllable n-grams using a known-item retrieval evaluation. The optimal span of 
fixed-length n-grams is found to be 2 syllables, and indexing words is found to be as 
effective as indexing syllable bigrams. 



6 Spoken dialogue technology: enabling the conversational user interface 
ACM Computing Surveys (CSUR) March 2002 
Volume 34 Issue 1 

Spoken dialogue systems allow users to interact with computer-based applications such as databases and expert 
systems by using natural spoken language. The origins of spoken dialogue systems can be traced back to Artificial 
Intelligence research in the 1950s concerned with developing conversational interfaces. However, it is only within the 
last decade or so, with major advances in speech technology, that large-scale working systems have been developed 
and, in some cases, introduced into commerc ... 
7 Information extraction for Thai documents 85% 
Rattasit Sukhahuta , Dan Smith 

Proceedings of the fifth international workshop on Information retrieval with Asian languages November 2000 

An increasing amount of electronically available information is stored in Asian language 
documents, which makes Information Retrieval (IR) and Information Extraction (IE) for 
these languages important for a large number of users. Analysis and extraction of 
information in these languages presents several interesting problems not seen in Western 
European languages; these are interesting in their own right and for the insights they can 
give into more general IR and IE techniques. We describe the ... 

8 Content-based language models for spoken document retrieval 82% 
Hsin-min Wang , Berlin Chen 

Proceedings of the fifth international workshop on Information retrieval with Asian languages November 2000 

Spoken document retrieval (SDR) has been extensively studied in recent years because of 
its potential use in navigating large multimedia collections in the near future. This paper 
presents a novel concept of applying the content-based language models to spoken 
document retrieval. In an example task for retrieval of Mandarin broadcast news, the 
content-based language models either trained with the automatic transcriptions of the 
spoken documents or adapted from the baseline language models usi ... 

9 Building and using cultural digital libraries: Supporting access to large digital oral history archives 80% 
Samuel Gustman , Dagobert Soergel , Douglas Oard , William Byrne , Michael Picheny , Bhuvana Ramabhadran , Douglas Greenberg 

Proceedings of the second ACM/IEEE-CS joint conference on Digital libraries July 2002 

This paper describes our experience with the creation, indexing, and provision of access to a very large archive of 
videotaped oral histories - 116,000 hours of digitized interviews in 32 languages from 52,000 survivors, liberators, 
rescuers, and witnesses of the Nazi Holocaust. It goes on to identify a set of critical research issues that must be 
addressed if we are to provide full and detailed access to collections of this size: issues in user requirement studies, 
automatic speech recognition, ... 

10 Query expansion using phonetic confusions for Chinese spoken document retrieval 80% 
Yuk-Chi Li , Wai-Kit Lo , Helen M. Meng , P. C. Ching 

Proceedings of the fifth international workshop on Information retrieval with Asian languages November 2000 

This paper presents a method of query expansion based on phonetic confusions for 
retrieving spoken documents using text queries. This method is applied to a Chinese 
spoken document retrieval task. A series of experiments have been carried out for 
Cantonese broadcast news documents using a multi-scale syllable-based retrieval 
approach. Our results show an improvement from AIR (average inverse rank) of 0.481 to 
0.491 when we apply query expansion based on phonetic confusions to our retrieval ta ... 

11 Music digital libraries: Enhancing access to the Levy sheet music collection: reconstructing full-text lyrics from syllables 77% 

Brian Wingenroth , Mark Patton , Tim DiLauro 

Proceedings of the second ACM/IEEE-CS joint conference on Digital libraries July 2002 

The goal of the Lester S. Levy Sheet Music Collection, Phase Two project is to develop tools, processes, and systems 
that facilitate collection ingestion through automated processes that reduce, but not necessarily eliminate human 
intervention[1]. One of the major components of this project is an optical music recognition (OMR) system[2] that 
extracts musical information and lyric text from the page images that comprise each piece in a collection. It is often 
the case, as it is with the Levy Col ... 

12 Spoken content metadata and MPEG-7 77% 
J. P. A. Charlesworth , P. N. Garner 

Proceedings of the 2000 ACM workshops on Multimedia November 2000 

The words spoken in an audio stream form an obvious descriptor essential to most 
audio-visual metadata standards. When derived using automatic speech recognition 
systems, the spoken content fits into neither low-level (representative) nor high-level 
(semantic) metadata categories. This results in difficulties in creating a representation 
that can support both interoperability between different extraction and application utilities 
while retaining robustness to the limitations of the extraction ... 

13 Two approaches for the resolution of word mismatch problem caused by English words and foreign words in Korean information retrieval 77% 

Byung-Ju Kang , Key-Sun Choi 

Proceedings of the fifth international workshop on Information retrieval with Asian languages November 2000 



In Korean text, recently, the use of English words with or without phonetic translation is 
growing at high speed. To make matters worse the Korean transliterations of an English 
word may be very various. The mixed use of English words and their various 
transliterations may cause severe word mismatch problem in Korean information retrieval. 
There can be two possible approaches, transliteration and back-transliteration method, to 
tackle the problem. We argue that our newly proposed transliterat ... 



14 A practical query-by-humming system for a large music database 77% 
Naoko Kosugi , Yuichi Nishihara , Tetsuo Sakata , Masashi Yamamuro , Kazuhiko Kushima 
Proceedings of the eighth ACM international conference on Multimedia October 2000 

A music retrieval system that accepts hummed tunes as queries is described in this paper. 
This system uses similarity retrieval because a hummed tune may contain errors. The 
retrieval result is a list of song names ranked according to the closeness of the match. 
Our ultimate goal is that the correct song should be first on the list. This means that 
eventually our system's similarity retrieval should allow for only one correct answer. 



The most significant improvement our system has ove ... 



15 Is Huffman coding dead? (extended abstract) 77% 
Abraham Bookstein , Shmuel T. Klein , Timo Raita 

Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval 

July 1993 